Large Scale Clustering of Dependent Curves

Authors

  • Huijing Jiang
  • Nicoleta Serban
Abstract

In this paper, we introduce a model-based method for clustering multiple curves or functionals under spatial dependence specified up to a set of unknown parameters. The functionals are decomposed using a semi-parametric model in which the fixed effects account for the large-scale clustering association and the random effects for the small-scale spatial dependence variability. The clustering model treats the cluster membership as a realization from a Markov random field. Within our estimation framework, the emphasis is on a large number of functionals/spatial units with sparsely sampled time points. To overcome the computational cost of operations on large dependence matrices, the estimation algorithm includes a two-stage approximation: a low-rank kernel-based decomposition of the dependence matrix and Incomplete Cholesky Factorization of the kernel matrix. We assess the performance of our clustering approach in a simulation study. The simulation results show that our method estimates clusters more accurately than other existing model-based clustering methods under a series of settings: a small number of time points, a low signal-to-noise ratio, and different spatial dependence structures. Many case studies fall within our clustering framework, but we focus on obtaining fine-grid spatial clusters for demographic trends, including ethnicity and income, for five southern states of the US over the past 11 years.
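
The second approximation stage named in the abstract, Incomplete Cholesky Factorization of a kernel matrix, can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the function names `rbf_kernel` and `incomplete_cholesky`, and the parameters `length_scale`, `max_rank` and `tol` are assumptions chosen for illustration. The sketch uses the standard pivoted variant, which builds an n-by-r factor G with K ≈ G Gᵀ without ever forming the full n-by-n kernel matrix K.

```python
import math


def rbf_kernel(x, y, length_scale=1.0):
    """Gaussian kernel between two points (an assumed choice of kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def incomplete_cholesky(points, kernel, max_rank, tol=1e-8):
    """Pivoted incomplete Cholesky factorization of a kernel matrix.

    Returns (G, pivots), where G is a list of n rows with r entries each,
    so that K[i][j] is approximated by the dot product of G[i] and G[j].
    Only r columns of K are evaluated, one per selected pivot.
    """
    n = len(points)
    # Diagonal of the residual K - G G^T; initially the diagonal of K.
    diag = [kernel(p, p) for p in points]
    G = [[] for _ in range(n)]
    pivots = []
    for _ in range(min(max_rank, n)):
        # Greedy pivot: the point with the largest unexplained variance.
        j = max(range(n), key=lambda i: diag[i])
        if diag[j] <= tol:
            break  # residual is negligible, stop early
        root = math.sqrt(diag[j])
        # New column: residual of column j of K, scaled by the pivot root.
        col = []
        for i in range(n):
            explained = sum(a * b for a, b in zip(G[i], G[j]))
            col.append((kernel(points[i], points[j]) - explained) / root)
        for i in range(n):
            G[i].append(col[i])
            diag[i] -= col[i] ** 2
        pivots.append(j)
    return G, pivots
```

With `max_rank` much smaller than the number of spatial units, later linear algebra can work with the thin factor G instead of the full dependence matrix, which is the kind of saving the abstract refers to. How the authors choose the rank or stopping rule is not stated in the abstract.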

Similar resources

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...


Online Aggregation of Coherent Generators Based on Electrical Parameters of Synchronous Generators

This paper proposes a novel approach for online clustering of coherent generators in a large power system following a wide area disturbance. An interconnected power system may become unstable due to a severe contingency when it is operated close to its stability boundaries. Hence, controlled islanding of the bulk power system is the last resort to prevent catastrophic cascading outages and wide area bl...


Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems

Ontology is the main infrastructure of the Semantic Web, providing facilities for integrating, searching, and sharing information on the web. The development of ontologies as the basis of the Semantic Web, together with their heterogeneity, has led to the emergence of ontology matching. With the emergence of large-scale ontologies in real domains, ontology matching systems face problems such as memory con...


Large-scale disease clustering and its application in epidemiological and health studies

Spatial autocorrelation statistics provide summary information about the spatial arrangement of data in a map. In fact, these statistics compare neighboring area values in order to assess the level of large scale clustering. Whenever a large number of neighboring areas have either relatively large or relatively small values, large scale clustering may be detected. Detecting such clustering is a...


Predictive Modeling of Large-scale Curves and Its Application on GDP Prediction of Multi-regions

The traditional approach to predicting large-scale sequential curves is to build a separate model for every curve, which inevitably causes a heavy and complicated modeling workload. The existing approach therefore lacks manipulability in application. A new method is proposed in this paper to solve this problem. By reducing model types of curves, clustering curves and modeling by clusters,...


Publication date: 2008